Scheduling Queries on Tape-resident Data
Abstract
Advances in storage technology have made near-line tertiary storage a viable alternative for database and data warehouse systems. Tertiary storage systems are employed where secondary storage cannot satisfy the data handling requirements or where tertiary storage is the more cost-effective option. Tertiary storage devices have traditionally been used as archival storage, but new application domains require on-demand retrieval of data from these devices. This paper investigates issues in optimizing I/O time for a query whose data resides on automated tertiary storage containing multiple storage devices. We model the problem as a limited-storage parallel two-machine flow-shop scheduling problem with additional constraints. Given a query, we establish an upper bound on the number of storage devices needed for an optimal I/O schedule and provide experimental proof for it. For queries that access small amounts of data from multiple media, we derive an optimal schedule analytically. For queries that access large amounts of data, we derive a heuristic-based scheduling algorithm using analytically proved results. We also discuss practical aspects of using the theoretical results in a real environment and demonstrate the applicability of our work using an accurate robotic tape library simulator.
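As background for the flow-shop formulation mentioned in the abstract, the following is a minimal sketch of Johnson's rule for the classic, unconstrained two-machine flow shop. It is not the paper's algorithm: the paper's problem adds limited storage, parallel devices, and further constraints that this sketch ignores. The job names and stage times are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

# (name, time on stage 1, time on stage 2); all values are hypothetical.
Job = Tuple[str, float, float]


def johnson_order(jobs: List[Job]) -> List[Job]:
    """Order jobs by Johnson's rule for the classic two-machine flow shop."""
    # Jobs that are faster on stage 1 go first, shortest stage-1 time first.
    first = sorted((j for j in jobs if j[1] <= j[2]), key=lambda j: j[1])
    # The remaining jobs go last, longest stage-2 time first.
    last = sorted((j for j in jobs if j[1] > j[2]), key=lambda j: j[2], reverse=True)
    return first + last


def makespan(order: List[Job]) -> float:
    """Completion time of the last job when both stages process jobs in this order."""
    t1 = 0.0  # time at which stage 1 becomes free
    t2 = 0.0  # time at which stage 2 becomes free
    for _, a, b in order:
        t1 += a                 # stage 1 works without idle time
        t2 = max(t2, t1) + b    # stage 2 waits for stage 1 if necessary
    return t2


jobs: List[Job] = [
    ("tape_A", 3, 6),
    ("tape_B", 5, 2),
    ("tape_C", 1, 2),
    ("tape_D", 7, 5),
]

if __name__ == "__main__":
    print("given order makespan:  ", makespan(jobs))
    print("Johnson order:         ", [name for name, _, _ in johnson_order(jobs)])
    print("Johnson order makespan:", makespan(johnson_order(jobs)))
```

In this toy instance, reordering the jobs shortens the total completion time without changing any individual job, which is the kind of gain a scheduling formulation aims for.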
Similar resources
Efficiently Scheduling Tape-resident Jobs
Jing Shi, Chunxiao Xing, Lizhu Zhou. Department of Computer Science and Technology, Tsinghua University, Beijing 100084, P.R. China. Abstract: Many large-scale data-intensive applications need to use a tape library to manage large data sets, thus it is critical to study the onl...
Query Pre-Execution and Batching in Paradise: A Two-Pronged Approach to the Efficient Processing of Queries on Tape-Resident Raster Images
The focus of the Paradise project [1,2] is to design and implement a scalable database system capable of storing and processing massive data sets such as those produced by NASA’s EOSDIS project. This paper describes extensions to Paradise to handle the execution of queries involving collections of satellite images stored on tertiary storage. Several modifications were made to Paradise in order ...
Tape-Disk Join Strategies under Disk Contention
Large-scale data warehousing, data mining, and scientific applications require the analysis of terabytes of facts data accumulated over long periods of time. Tape libraries are suitable devices for storing such mass data. The online analytical processing (OLAP) of this data typically leads to long-running aggregation queries joining the tape-resident facts relation with disk-resident dimension ...
Scheduling Queries for Tape-Resident Data
Tertiary storage systems are used when secondary storage cannot satisfy the data storage requirements and/or tertiary storage is the more cost-effective option. The new application domains require on-demand retrieval of data from these devices. This paper investigates issues in optimizing I/O time for a query whose data resides on automated tertiary storage containing multiple storage devices.
An Architecture for Using Tertiary Storage in a Data Warehouse
In this paper, we present an architecture for a data warehouse that provides convenient, flexible, and high-performance access to tape-resident data. Data warehouses allow an organization to store and analyze operational data. Many data warehouses (e.g., for telecommunications) create very large data sets that can be economically stored only on tertiary storage. At the same time, data analysts ...
Publication date: 2000